<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
                      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
    <meta http-equiv="content-type" content="text/html; charset=UTF-8"/>
    <title>Zend_Search_Lucene Introduction - Zend Framework Manual</title>

    <link href="../css/shCore.css" rel="stylesheet" type="text/css" />
    <link href="../css/shThemeDefault.css" rel="stylesheet" type="text/css" />
    <link href="../css/styles.css" media="all" rel="stylesheet" type="text/css" />
</head>
<body>
<h1>Zend Framework</h1>
<h2>Programmer's Reference Guide</h2>
<ul>
    <li><a href="../en/learning.lucene.intro.html">Inglês (English)</a></li>
    <li><a href="../pt-br/learning.lucene.intro.html">Português Brasileiro (Brazilian Portuguese)</a></li>
</ul>
<table width="100%">
    <tr valign="top">
        <td width="85%">
            <table width="100%">
                <tr>
                    <td width="25%" style="text-align: left;">
                    <a href="learning.lucene.html">Iniciando com o Zend_Search_Lucene</a>
                    </td>

                    <td width="50%" style="text-align: center;">
                        <div class="up"><span class="up"><a href="learning.lucene.html">Iniciando com o Zend_Search_Lucene</a></span><br />
                        <span class="home"><a href="manual.html">Guia de Refer&ecirc;ncia do Programador</a></span></div>
                    </td>

                    <td width="25%" style="text-align: right;">
                        <div class="next" style="text-align: right; float: right;"><a href="learning.lucene.index-structure.html">Lucene Index Structure</a></div>
                    </td>
                </tr>
            </table>
<hr />
<div id="learning.lucene.intro" class="section"><div class="info"><h1 class="title">Zend_Search_Lucene Introduction</h1></div>
    

    <p class="para">
        The <span class="classname">Zend_Search_Lucene</span> component is intended to provide a
        ready-for-use full-text search solution. It doesn&#039;t require any <acronym class="acronym">PHP</acronym>
        extensions<a href="#fnid1" name="fn1"><sup>[1]</sup></a><acronym class="acronym">UTF-8</acronym><em class="emphasis">mbstring</em> or additional software to be installed, and can be used
        immediately after Zend Framework installation.
    </p>

    <p class="para">
        <span class="classname">Zend_Search_Lucene</span> is a pure <acronym class="acronym">PHP</acronym> port of the
        popular open source full-text search engine known as Apache Lucene. See <a href="http://lucene.apache.org" class="link external">&raquo; http://lucene.apache.org/</a> for the details.
    </p>

    <p class="para">
        Information must be indexed to be available for searching.
        <span class="classname">Zend_Search_Lucene</span> and Java Lucene use a document concept known as an
        &quot;atomic indexing item.&quot;
    </p>

    <p class="para">
        Each document is a set of fields: &lt;name, value&gt; pairs where name and value are
        <acronym class="acronym">UTF-8</acronym> strings<a href="#fnid2" name="fn2"><sup>[2]</sup></a>. Any subset of the document fields may be marked
        as &quot;indexed&quot; to include field data in the text indexing process.
    </p>

    <p class="para">
        Field values may or may not be tokenized while indexing. If a field is not tokenized, then
        the field value is stored as one term; otherwise, the current analyzer is used for
        tokenization.
    </p>

    <p class="para">
        Several analyzers are provided within the <span class="classname">Zend_Search_Lucene</span> package.
        The default analyzer works with <acronym class="acronym">ASCII</acronym> text (since the
        <acronym class="acronym">UTF-8</acronym> analyzer needs the <em class="emphasis">mbstring</em> extension to be
        turned on). It is case insensitive, and it skips numbers. Use other analyzers or create your
        own analyzer if you need to change this behavior.
    </p>

    <blockquote class="note"><p><b class="note">Note</b>: <span class="info"><b>Using analyzers during indexing and searching</b><br /></span>
        

        <p class="para">
            Important note! Search queries are also tokenized using the &quot;current analyzer&quot;, so the
            same analyzer must be set as the default during both the indexing and searching process.
            This will guarantee that source and searched text will be transformed into terms in the
            same way.
        </p>
    </p></blockquote>

    <p class="para">
        Field values are optionally stored within an index. This allows the original field data to
        be retrieved from the index while searching. This is the only way to associate search
        results with the original data (internal document IDs may be changed after index
        optimization or auto-optimization).
    </p>

    <p class="para">
        The thing that should be remembered is that a Lucene index is not a database. It doesn&#039;t
        provide index backup mechanisms except backup of the file system directory. It doesn&#039;t
        provide transactional mechanisms though concurrent index update as well as concurrent update
        and read are supported. It doesn&#039;t compare with databases in data retrieving speed.
    </p>

    <p class="para">
        So it&#039;s good idea:
    </p>

    <ul class="itemizedlist">
        <li class="listitem">
            <p class="para">
                <em class="emphasis">Not</em> to use Lucene index as a storage since it may dramatically
                decrease search hit retrieving performance. Store only unique document identifiers
                (doc paths, <acronym class="acronym">URL</acronym>s, database unique IDs) and associated data within
                an index. E.g. title, annotation, category, language info, avatar. (Note: a field
                may be included in indexing, but not stored, or stored, but not indexed).
            </p>
        </li>

        <li class="listitem">
            <p class="para">
                To write functionality that can rebuild an index completely if it&#039;s corrupted for
                any reason.
            </p>
        </li>
    </ul>

    <p class="para">
        Individual documents in the index may have completely different sets of fields. The same
        fields in different documents don&#039;t need to have the same attributes. E.g. a field may be
        indexed for one document and skipped from indexing for another. The same applies for
        storing, tokenizing, or treating field value as a binary string.
    </p>
<div class="footnote"><a name="fnid1" href="#fn1"><sup>[1]</sup></a><span class="para footnote">Though some  processing functionality
                requires the  extension to be turned
                on</span></div>
<div class="footnote"><a name="fnid2" href="#fn2"><sup>[2]</sup></a><span class="para footnote">Binary strings are also allowed to be used
                as field values</span></div>
</div>
        <hr />

            <table width="100%">
                <tr>
                    <td width="25%" style="text-align: left;">
                    <a href="learning.lucene.html">Iniciando com o Zend_Search_Lucene</a>
                    </td>

                    <td width="50%" style="text-align: center;">
                        <div class="up"><span class="up"><a href="learning.lucene.html">Iniciando com o Zend_Search_Lucene</a></span><br />
                        <span class="home"><a href="manual.html">Guia de Refer&ecirc;ncia do Programador</a></span></div>
                    </td>

                    <td width="25%" style="text-align: right;">
                        <div class="next" style="text-align: right; float: right;"><a href="learning.lucene.index-structure.html">Lucene Index Structure</a></div>
                    </td>
                </tr>
            </table>
</td>
        <td style="font-size: smaller;" width="15%"> <style type="text/css">
#leftbar {
	float: left;
	width: 186px;
	padding: 5px;
	font-size: smaller;
}
ul.toc {
	margin: 0px 5px 5px 5px;
	padding: 0px;
}
ul.toc li {
	font-size: 85%;
	margin: 1px 0 1px 1px;
	padding: 1px 0 1px 11px;
	list-style-type: none;
	background-repeat: no-repeat;
	background-position: center left;
}
ul.toc li.header {
	font-size: 115%;
	padding: 5px 0px 5px 11px;
	border-bottom: 1px solid #cccccc;
	margin-bottom: 5px;
}
ul.toc li.active {
	font-weight: bold;
}
ul.toc li a {
	text-decoration: none;
}
ul.toc li a:hover {
	text-decoration: underline;
}
</style>
 <ul class="toc">
  <li class="header home"><a href="manual.html">Guia de Refer&ecirc;ncia do Programador</a></li>
  <li class="header up"><a href="manual.html">Guia de Refer&ecirc;ncia do Programador</a></li>
  <li class="header up"><a href="learning.html">Conhecendo o Zend Framework</a></li>
  <li class="header up"><a href="learning.lucene.html">Iniciando com o Zend_Search_Lucene</a></li>
  <li class="active"><a href="learning.lucene.intro.html">Zend_Search_Lucene Introduction</a></li>
  <li><a href="learning.lucene.index-structure.html">Lucene Index Structure</a></li>
  <li><a href="learning.lucene.index-opening.html">Index Opening and Creation</a></li>
  <li><a href="learning.lucene.indexing.html">Indexing</a></li>
  <li><a href="learning.lucene.searching.html">Searching</a></li>
  <li><a href="learning.lucene.queries.html">Supported queries</a></li>
  <li><a href="learning.lucene.pagination.html">Search result pagination</a></li>
 </ul>
 </td>
    </tr>
</table>

<script type="text/javascript" src="../js/shCore.js"></script>
<script type="text/javascript" src="../js/shAutoloader.js"></script>
<script type="text/javascript" src="../js/main.js"></script>

</body>
</html>